ABSTRACT

This paper studies convolutional neural networks (CNN) to
learn unsupervised feature representations for 44 different
plant species, collected at the Royal Botanic Gardens, Kew,
England. To gain intuition on the features chosen by the
CNN model (as opposed to a 'black box' solution), a
visualisation technique based on deconvolutional networks (DN)
is utilized. It is found that venations of different orders have
been chosen to uniquely represent each of the plant species.
Experimental results using these CNN features with different
classifiers show consistency and superiority compared to the
state-of-the-art solutions which rely on hand-crafted features.

Index Terms — plant classification, deep learning, feature 
visualisation 

1. INTRODUCTION 

Plants are the backbone of all life on earth, providing us with
food and oxygen. A good understanding of plants is essential
to help in identifying new or rare plant species in order to
improve the drug industry, balance the ecosystem, and support
agricultural productivity and sustainability [1]. Among other
approaches, botanists use variations in leaf characteristics as
a comparative tool for their study of plants. This is because
leaf characteristics are available to be observed and examined
throughout the year in deciduous, annual plants or year-round
in evergreen perennials.

In computer vision, although many efforts (e.g. with
sophisticated computer vision algorithms) have been made,
plant identification is still considered a challenging
and unsolved problem. This is because plants in nature have
very similar shape and colour representations, as illustrated
in Fig. 1. Kumar et al. proposed an automatic plant
species identification system named LeafSnap. They identified
plants based on curvature-based shape features of the leaf,
utilizing integral measures to compute functions of the
curvature at the boundary. Identification is then done by nearest
neighbours (NN). Other solutions employed geometric,
multi-scale distance matrix, moment invariant, colour,
texture and venation features to identify a plant.



Fig. 1: Sample of the 44 plant species employed in this paper. 
It can be noticed that all the plant species have almost similar 
colour representation and shape. 

Although successful, one must note that the performance of
these aforementioned solutions is highly dependent on the
chosen set of features, which are task or dataset dependent.
That is, they may suffer from the dataset bias problem.

In this paper, we propose to employ deep learning in a
bottom-up and top-down manner for plant identification. In
the former, we choose to use a convolutional neural network
(CNN) model to learn the leaf features as a means to perform
plant classification. In the latter, rather than using the CNN
as a black box mechanism, we employ deconvolutional networks
(DN) to visualize the learned features. This is in order
to gain a visual understanding of which features are important
for distinguishing leaves from different classes, thus avoiding
the necessity of designing hand-crafted features. Empirically,
our method outperforms state-of-the-art approaches [3, 9, 11]
using the features learned from the CNN model in classifying 44
different plant species.

This paper presents two contributions. First, we propose
a CNN model to automatically learn the feature representation
for plant categories, replacing the need to design
hand-crafted features as in previous approaches [3, 9].
Second, we identify and diagnose the feature representation
learnt by the CNN model through a visualisation strategy
based on the DN. This avoids using the CNN model
as a black box solution, and also provides researchers with
insight into how the algorithm "sees" or "perceives" a leaf.
Finally, a new leaf dataset, named the MalayaKew (MK) Leaf
Dataset, is also collected with full annotation.

Fig. 2: Architecture of our CNN model for plant identification.

Fig. 3: Our deep learning framework in a bottom-up and top-down
manner to study and understand plant identification.

The rest of the paper is organized as follows: Section 2
reviews the concept of deep learning, in particular our CNN
and DN model for plant identification. Section 3 presents our
findings and a comparison with conventional solutions. Finally,
conclusions are drawn in Section 4.

2. PROPOSED APPROACH 

In this section, we first explain how we employ the pre-trained
CNN model to perform plant identification. Then, we detail
how a DN model is utilised with our new visualisation strategy
to understand how the CNN model works in identifying
different plant species. Fig. 3 depicts the overall framework
of our approach.


2.1. Convolutional Neural Network 

The CNN model used in this paper is based on a previously
proposed model, with the ILSVRC2012 dataset used for pre-training.
Rather than training a new CNN architecture, we re-used the
pre-trained network because a) recent work reported that
features extracted from the activations of a CNN trained in
a fully supervised manner on large-scale object recognition
tasks can be re-purposed to novel generic tasks; b) our training
set is not as large as the ILSVRC2012 dataset, and the
performance of a CNN model depends heavily on the quantity
and the level of diversity of the training set; and c) training
a deep model requires skill and experience, and is also
time-consuming.

For our CNN model, we perform fine-tuning using a 44-class
leaf dataset collected at the Royal Botanic Gardens,
Kew, England. Thus, the final fully connected layer is set to
have 44 neurons, replacing the original 1000 neurons. The
full model of our CNN architecture is depicted in Fig. 2. The
first convolutional layer filters the 227 x 227 x 3 input leaf
images with 96 kernels of size 11 x 11 x 3 with a stride of 4
pixels. Then, the second convolutional layer takes the pooled
feature maps from the first layer and convolves them with 256
filters of size 5 x 5 x 48. Following this, the output is fed to the
third and later to the fourth convolutional layer. The third and
fourth convolutional layers, which have 384 kernels of size
3 x 3 x 256 and 384 kernels of size 3 x 3 x 192 respectively,
perform only convolution without pooling. The fifth convolutional
layer has 256 kernels of size 3 x 3 x 192. After performing
convolution and pooling in the fifth layer, the output is
fed into fully-connected layers which have 4096 neurons. For
the parameter setting, the learning rate multipliers of the filters
and biases are set to 10 and 20, respectively.
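
As a rough sanity check on the layer sizes quoted above, the number of kernel weights per convolutional layer can be computed directly from the stated kernel counts and dimensions. The sketch below is only illustrative: the helper names are hypothetical, biases are not counted, and the first-layer output size assumes no zero padding, which the text does not state.

```python
# (name, number of kernels, kernel height, kernel width, kernel depth),
# taken from the kernel sizes quoted in the text.
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]


def kernel_weights(n_kernels, height, width, depth):
    """Number of weights in one convolutional layer (biases excluded)."""
    return n_kernels * height * width * depth


def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


weights = {name: kernel_weights(n, h, w, d) for name, n, h, w, d in CONV_LAYERS}
total_conv_weights = sum(weights.values())

# Assumption: no zero padding in the first layer.
conv1_out = conv_output_size(227, 11, 4)

for name, count in weights.items():
    print(f"{name}: {count} weights")
print(f"total: {total_conv_weights} weights, conv1 output: {conv1_out} x {conv1_out}")
```

Under these assumptions the five convolutional layers hold roughly 2.3 million weights, so most of the capacity that fine-tuning adapts to the 44 leaf classes sits elsewhere, in the fully-connected layers.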
Fig. 5: Failure analysis on our proposed CNN model in D1.


Fig. 4: Visualisation (V1) strategy to understand how and
why our CNN works/fails. Best viewed in colour.


2.2. Deconvolutional Network 

The CNN model learns and optimises the filters in each layer
through the back propagation mechanism. These learned filters
extract important features that uniquely represent the input
leaf image. Therefore, in order to understand why and
how the CNN model operates (instead of treating it as a "black
box"), filter visualisation is required to observe the
transformation of the features, as well as to understand the internal
operation and the characteristics of the CNN model. Moreover,
through this process we can identify the unique features on the
leaf images that are deemed important to characterize a plant.
Multi-layered DNs were introduced to enable us
to observe the transformation of the features by projecting the
feature maps back to the input pixel space. Specifically, the
feature maps from layer n are alternately deconvolved and
unpooled continuously down to input pixel space. That is, given
the feature maps, as:
where layer l is a deconvolutional layer and K are the filters.

To visualize our CNN model, we employ a strategy named
V1, based on the DN approach. The purpose of V1
is to examine the overall highest activation parts across all
feature maps for a given layer l, so that, through the reconstructed
image, we can observe the highly activated regions of the
leaf in that layer. To do so, among all the absolute
activations in that layer, we keep only the S largest
pixel values, set the rest to zero, and project the result down to
pixel space to reconstruct an image as defined:
Fig. 6: Failure analysis on our proposed CNN model in D2.


where S = 1, 2, ..., size(Y^(l)). With this, we can observe
the highly activated regions of the leaf in that layer. The visual
results of S = 1, S = 5 and S = 'All' are illustrated in Fig. 4.
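
The selection step of the V1 strategy (keep the S largest absolute activations, zero the rest) can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function name is hypothetical, the activations of a layer are assumed to be flattened into a plain list, and the subsequent projection back to pixel space through the DN is not shown.

```python
def keep_top_s(activations, s):
    """Return a copy of `activations` in which only the `s` entries with
    the largest absolute value are kept and all others are set to zero."""
    if s < 0:
        raise ValueError("s must be non-negative")
    # Indices ordered by decreasing absolute activation; ties keep their
    # original order because Python's sort is stable.
    order = sorted(
        range(len(activations)),
        key=lambda i: abs(activations[i]),
        reverse=True,
    )
    keep = set(order[:s])
    return [value if i in keep else 0.0 for i, value in enumerate(activations)]


# Hypothetical flattened activations of one layer.
layer_activations = [0.1, -3.0, 2.0, 0.5]

top1 = keep_top_s(layer_activations, 1)      # analogous to S = 1
top2 = keep_top_s(layer_activations, 2)
top_all = keep_top_s(layer_activations, len(layer_activations))  # S = 'All'
```

With S = 1 only the single strongest response survives, whereas S = 'All' leaves the feature maps untouched before projection.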

3. EXPERIMENTAL RESULTS 

3.1. Data Preparation 

A new leaf dataset, named the MalayaKew (MK) Leaf Dataset,
which consists of 44 classes collected at the Royal Botanic
Gardens, Kew, England, is employed in the experiment.
Samples of the leaf dataset are illustrated in Fig. 1, and we
can see that this dataset is very challenging, as leaves from
different classes have very similar appearance. A dataset (D1)
is prepared to compare the performance of the trained CNN.
That is, we use leaf images as a whole where, in each leaf
image, foreground pixels are extracted using the HSV colour
space information. To enlarge the D1 dataset, we rotate
each leaf image in 7 different orientations, i.e. 45°, 90°,
135°, 180°, 225°, 270° and 315°. We then randomly select
528 leaf images for testing and 2288 images for training.
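
The rotation-based enlargement and the random split can be sketched as below. This is only an illustration under stated assumptions: the function names, the fixed random seed and the integer placeholders standing in for images are hypothetical, the unrotated image is assumed to be kept alongside its seven rotations, and the count of 352 source images is not given in the text but is simply the value for which 8 variants per image yields 2288 + 528 samples.

```python
import random

# The seven rotation angles listed in the text, in degrees.
ROTATION_ANGLES = list(range(45, 360, 45))


def augmented_ids(image_ids, angles=ROTATION_ANGLES):
    """Pair every image id with angle 0 (the original) and each rotation."""
    return [(img, angle) for img in image_ids for angle in [0] + list(angles)]


def random_split(samples, n_test, seed=0):
    """Shuffle a copy of `samples` and return (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Assumption: 352 placeholder ids, giving 352 * 8 = 2816 samples in total.
samples = augmented_ids(range(352))
train, test = random_split(samples, n_test=528)
```

The actual rotation of pixel data is left to whichever image library is in use; only the bookkeeping is shown here.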

3.2. Results and Failure Analysis - D1

In this section, we present a comparative performance
evaluation of the CNN model on plant identification. From
Table 1, it is noticeable that using the features learnt from the
CNN model (98.1%) outperforms state-of-the-art solutions
[3, 9, 11] that employ carefully chosen hand-crafted features,
even when different classifiers are used. We performed failure
analysis and observed that most of the misclassified leaves are
from Class 2 (4 misclassified), followed by Class 23 (3), Class 9
& 27 (2 each), and Class 38 (1). From our investigation, as
illustrated in Fig. 5, the Q. robur f. purpurascens leaves (i.e.
Class 2) were misclassified as Q. acutissima (i.e. Class 9),
Q. rubra 'Aurea' (i.e. Class 27) and Q. macranthera (i.e. Class
39), which have almost the same outline shape as Class 2.
The rest of the misclassified testing images are also found to
be misled for the same reason.

Fig. 7: Feature visualisation using DN. It shows that shape (feature) is chosen in D1. Best viewed in colour.

Fig. 8: Feature visualisation using DN. It shows that venation and the departure between different order venations (feature) are
chosen in D2. Best viewed in colour.

In order to further understand how and why the CNN fails,
here we delve into the internal operation and behaviour of
the CNN model via the V1 strategy. We evaluate the single largest
pixel value across the feature maps. Our observation from the
reconstructed images in Fig. 7 shows that the highly activated
parts fall on the outline shape of the leaves. So, we deduce that
leaf shape is not a good choice to identify plants.

3.3. Results and Failure Analysis - D2 

Here, we build a variant dataset (D2), where we manually crop
each leaf image in D1 into patches within the area of the
leaf (so that no shape is included). This investigation is
twofold. On one hand, we want to know the precision
of the plant identification classifier when the leaf shape is
excluded. On the other hand, we would like to find out whether
plant identification can be done using just a patch of the leaf.
Since the original images range from 3000 x 3000 to 500 x 500,
three different leaf patch sizes (i.e. 500 x 500, 400 x 400
and 256 x 256) are chosen. Similarly, we increase the diversity
of the leaf patches by rotating them in the same manner
as for D1. We randomly select 8800 leaf patches for testing
and 34672 leaf patches for training.

Table 1: Performance Comparison on the MK Leaf Dataset
with Different Classifiers. Note that MLP = Multilayer
Perceptron, SVM = Support Vector Machine, and RBF = Radial
Basis Function.

Feature               Classifier                 Accuracy
From Deep CNN (D1)    MLP                        0.977
From Deep CNN (D1)    SVM (linear)               0.981
From Deep CNN (D2)    MLP                        0.995
From Deep CNN (D2)    SVM (linear)               0.993
LeafSnap              SVM (RBF)                  0.420
LeafSnap              NN                         0.589
HCF                   SVM (RBF)                  0.716
HCF-ScaleRobust       SVM (RBF)                  0.665
Combine               Sum rule (SVM (linear))    0.951
SIFT                  SVM (linear)               0.588

In Table 1, we can see that the classification accuracy of
the CNN model trained using D2 (99.6%) is higher than
using D1 (97.7%). Again, we perform the visualisation via the V1
strategy, as depicted in Fig. 8, to understand why the CNN
trained with D2 has a better performance. From layer to layer,
we notice that the activation part falls not only on the primary
venation but also on the secondary venation and the
departure between different order venations. Therefore, we
deduce that venations of different orders are more robust
features for plant identification. This also agrees with
some studies highlighting that quantitative leaf venation
data have the potential to revolutionize the plant identification
task. Existing work has also employed venation to
perform plant classification. However, as
opposed to these solutions, we automatically learn the venation
of different orders, while they use a set of heuristic rules
that are hard to replicate.

We also analysed the drawbacks of our CNN model with
D2 and observed that most of the misclassified patches are
from Class 9 (18 misclassified), followed by Class 2 (13), Class
30 (5), Class 28 (3) and Class 1, 31 & 42 (1 each). The
contributing factor of the misclassification seems to be the
condition of the leaves, where the samples are noticeably affected
by environmental factors such as wrinkled surfaces and insect
damage. Examples of such conditions are shown in Fig. 6.
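
As a consistency check, the per-class misclassification counts reported in the two failure analyses can be converted into overall accuracies. The sketch below assumes that the listed classes account for all test errors (the text only says "most") and that the counts refer to the 528 D1 test images and the 8800 D2 test patches; the function name is hypothetical.

```python
def accuracy_from_errors(errors_per_class, n_test):
    """Overall accuracy given per-class error counts and the test set size."""
    total_errors = sum(errors_per_class.values())
    return 1.0 - total_errors / n_test


# Class id -> number of misclassified test samples, as listed in the text.
d1_errors = {2: 4, 23: 3, 9: 2, 27: 2, 38: 1}
d2_errors = {9: 18, 2: 13, 30: 5, 28: 3, 1: 1, 31: 1, 42: 1}

d1_accuracy = accuracy_from_errors(d1_errors, n_test=528)
d2_accuracy = accuracy_from_errors(d2_errors, n_test=8800)

print(f"D1: {d1_accuracy:.3f}, D2: {d2_accuracy:.3f}")
```

Under these assumptions the counts correspond to accuracies of about 0.977 for D1 and 0.995 for D2, which agree with the MLP rows of Table 1.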

4. CONCLUSION 

This paper studied a deep learning approach to learn
discriminative features from leaf images with classifiers for plant
identification. From the experimental results, we showed that
learning the features through a CNN can provide a better feature
representation for leaf images compared to hand-crafted
features. Moreover, we demonstrated that venation structure is
an important feature to identify different plant species, with a
performance of 99.6%, outperforming conventional solutions.
This is verified by analysing the internal operation and
behaviour of the network through the DN visualisation technique.
In future work, we will extend the work to recognize plants in
the wild.